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Abstract 

Wind direction plays an important role in the spread of pollutant levels over a 
geographical region. We discuss how to include wind directional information in the 
covariance function of spatial models. We follow the spatial convolution approach 
initially proposed by Higdon and co-authors, wherein a spatial process is described 
by a convolution between a smoothing kernel and a white noise process. We pro- 
pose two different ways of accounting for wind direction in the kernel function. 
For comparison purposes, we also consider a more flexible kernel parametrization, 
that makes use of latent processes which vary smoothly across the region. Inference 
procedure follows the Bayesian paradigm, and uncertainty about parameter estima- 
tion is naturally accounted for when performing spatial interpolation. We analyze 
ozone levels observed at a monitoring network in the Northeast of the USA. Sam- 
ples from the posterior distribution under our proposed models are obtained much 
faster when compared to the kernel based on latent processes. Our models provide 
better results, in terms of model fitting and spatial interpolation, when compared 
to simple isotropic and geometrical anisotropic models. Despite the small number 
of parameters, our proposed models provide fits which are comparable to those 
obtained under the kernel based on latent processes. 
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1 Introduction 



1.1 Motivation 

As described in the homepage of the United States Environmental Protection Agency 
(EPA) (http : //www. epa.gov/airnow/airaware/day2-detail .html) "large weather sys- 
tems dictate the predominant wind direction, which can have a considerable effect on the 
quality of air in a specific city or region. Air quality can worsen in a particular region 
if the wind is blowing from a region that contains numerous sources of pollution. If the 
winds are coming from areas with little or no pollution, they can make your air quality 
better. Very light winds or no wind, such as those in a strong high pressure system, can 
be a problem for urban areas, because all the pollution that a city creates stays in one 
place." 

Usually, air pollution data are observed at fixed points (monitoring stations) of a 
region of interest. It is common practice to model these data using spatial models (see 
e.g. Banerjee et al. (2004)), which allow for spatial correlation among observations at 
different locations. More specifically, in spatial statistics, one usually assumes that ob- 
servations are partial realizations of a stochastic process {Z(s),s G G}, with G G M. c , 
where commonly C — 2, and the components of the location vector s are geographical 
coordinates. Frequently, it is assumed that Z(-) follows a Gaussian process (GP) with a 
stationary and isotropic covariance structure; that is the spatial distribution is unchanged 
under rotation about the origin, translation of the origin of the index set. 

When modelling air pollutant data it is expected from the quote above that wind 
direction has a local effect in the process of interest and the assumption of stationarity 
and isotropy seem unreaslitic. Moreover, in the spatial setting, the usual aim is to 
make spatial interpolation to unobserved locations of interest, based on observed values 
at monitored locations. This interpolation is heavily based on the specification of the 
mean and covariance structures of the GP. In environmental problems the assumption 
of stationary covariance structures is commonly violated due to local influences in the 
covariance structure of the process. 

In this paper, we aim at proposing non-stationary spatial models that account for wind 
direction in the covariance structure of air pollutant data. Winds are vector quantities 
and can be split into orthogonal components. Usually, wind direction is reported through 
two horizontal components, the u component which represents the east-west direction, 
and the v component which represents the north-south direction. The challenge is how to 
account for this directional information when common correlation functions are usually 
functions of the norm of a vector. It is expected that if the wind at two arbitrary locations 
is blowing in the same direction, the process at the two locations should be more highly 
correlated than the process at locations which have winds blowing in different directions. 
We propose two different ways of accounting for wind direction in the covariance function 
of a spatial process. 

In order to show the ability of our proposed models in capturing the influence of wind 
direction on the covariance structure of air pollutant data, we analyze levels of ozone 
observed at a particular time of a day, at monitoring locations located in the northeast 



of the USA. As described in the AIRNow website (http://www.airnow.gov), ozone forms 
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near ground level when air pollutants (emitted by sources such as cars, power plants, and 
chemical plants) react chemically in the presence of sunlight. EPA maintains the Clean 
Air Status and Trends Network (CASTNET) which is "a national air quality monitoring 
network designed to provide data to assess trends in air quality, atmospheric deposition, 
and ecological effects due to changes in air pollutant emissions" . The dataset we analyze 
here was downloaded from the site 

http : // j ava . epa . gov/castnet /epa_j sp/ prepackageddata . j sp/ #ozone[ We consider 
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48 monitoring stations, all of them containing information about ozone and wind 
measurements. Figure [T] shows the locations of the monitoring sites and the observed 
data. 




-85 -75 
Longitude 



Figure 1: Observed values of ozone (solid squares in grayscale), and respective wind direc- 
tion (arrows) for December, 11th, 2008, 3pm. For predictive purposes, circled locations 
are left out from the inference procedure performed in Section |4j 

There are many different alternatives available in the literature to model flexible 
spatial covariance structures. In the last decade many of them have focused on the fact 
that a Gaussian process can be described as a convolution between a white noise process 
and a kernel function. Next Subsection reviews these approaches in more detail, as our 
proposal in Section [2] is based on the spatial convolution approach. 

1.2 A brief review of spatial convolution models 

A stochastic process can be constructed by convolving a white noise process W(-) 
with a smoothing kernel k(-) by defining the process as Y(s) = j G k(h)W(h)dh. Higdon 
et al. (1999) propose non-stationary spatial models based on this framework by allowing 
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the kernel to vary smoothly with location, that is, they propose 

Y(s) = J k s (h) W (h) dh. 



The kernel k s (.) is an arbitrary function; it is common practice to define it based on 
density functions or nonnegative functions that decay monotonically from their mode 
(Paciorek and Schervish 2006). Paciorek and Schervish (2006) show that when k s (.) is 
a Gaussian kernel centered at s, and with covariance matrix E(s), the resultant nonsta- 
tionary covariance structure of the process Y(.) is equal to 



Cov(Y( Si ),Y( Sj )) = <7 2 |E( Si )| 1/4 |E( Sj 



1 1/4 



-1/2 



exp(Qij), for s { , Sj G G, 



where Qij 



(si — Sj). Higdon et al. (1999) propose to model the 



uj — J 3) \ 2 

covariance matrix E(sj) through the connection between a bivariate normal distribution 
and its one standard deviation ellipse, such that 



E(s) =R(s) 



2ir 











2ir 2 



R(s), (2) 



where ^(s)!! — V'i( s ) 2 + ^(s) 2 , and the rotation matrices, R(s), are given by 



R(s) 



cos(cu(s)) sin(w(s)) 
— sin(cu(s)) cos(a;(s)) 



with oj (s) = arctan (ip 2 (s) /ipi (s)). Higdon et al. (1999) let ip(s) = (tpi(s), ip 2 (s))' be 
the focal points, such that —ip(s) = (— ipi(s), — ^ 2 (s)), Vs G G, define a set of ellipses 
centered at the origin, all of them with a fixed area e, which is common to all locations. 
They suggest to fix this area after some exploratory data analysis. 

Because this resultant covariance structure is infinitely differentiable, Paciorek and 
Schervish (2006) generalize the kernel convolution approach by introducing a new class 
of nonstationary covariance functions. This class includes a nonstationary version of the 
Matern covariance function, which is of the form 



C(si, sj) = a' 



1 



r(z/)2 i 



-|E( S ,)|-V4| E(Sj) |-V4 



E(si) + E(s, 



-1/2 



2y^Q~Y k v (2y/UQ^j, (3) 



with as above, v > is the shape parameter, and k v {.) is the modified Bessel 
function of the second kind of order v. Again the main issue is how to model E(sj). 
Paciorek and Schervish (2006) model E(sj) using the spectral decomposition, by defining 
E(sj) = FiAjTj, where Aj is a diagonal matrix of eigenvalues, Ai(sj) and A 2 (sj), and Tj 
is an eigenvector matrix. They impose some restriction on the elements of Tj and Aj 
to limit the number of hyperparameters and improve mixing of the chains. Similar to 
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Higdon et al. (1999), they make use of latent spatial processes to allow F{ and A« to vary 
smoothly across the spatial region. 

Note that when modelling S(sj) following either Higdon et al. (1999), or Paciorek 
and Schervish (2006), the number of parameters to be estimated increases linearly with 
the number of observed locations. This is because the covariance matrix £(sj) modelled 
either as in equation (pi), or as in d3l), is built through latent Gaussian processes leading 
to a great number of parameters to be estimated. 

The advantage of specifying a Gaussian kernel in equation 0, or the nonstationary 
version of the Matern covariance function, is that the resultant covariance function has a 
closed form. If a different kernel function is used, and it does not result in a closed form 
of the covariance function, one can approximate the continuous model in Q through a 
finite sum by defining 

m 

Y(s) = J2ks(h l )W(h l ), (4) 
i=i 

where W(-) is a white noise process defined over a fixed grid of m points, {hi, • • • , h m }, 
overlaid on G. See Higdon (1998) for a spatio-temporal application. As pointed out 
by Higdon (2002), discretized convolutions are economical parameterizations and can 
greatly facilitate computation, since a small number of processes, W(hi),--- ,W(h m ), 
effectively control the entire spatial process Y(-), even though Y(-) may be required at 
any location in G. Besides being used to approximate the integral and to alleviate the 
computation burden associated with model fitting, the discretized convolution framework 
given in Q has been extended to allow W(-) to be a spatial dependent process itself. 
Examples of such approaches are discussed, for example, in Fuentes (2002), Lee et al. 
(2005) and Cressie and Johannesson (2006). 

Usually, the above models are estimated under the Bayesian framework, and Markov 
chain Monte Carlo (MCMC) methods are used to obtain samples from the resultant 
posterior distribution. Even under the discretized version of the convolution structure 
these models have a great number of parameters to be estimated. All of the parameters 
involved in the covariance structure of the process Y(.) do not result on a known full 
conditional posterior distribution. Usually Metropolis-Hastings steps are used to obtain 
samples from such full conditionals, and it is quite challenging to tune the variance 
of the proposal distribution in order to obtain reasonable acceptance rates and reach 
convergence of the chains. Indeed, the models above require fixing some hyperparameters 
which are problem specific. And from our experience, the algorithms seem quite sensitive 
to the choice of these hyperparameters. See Section 5.1 of Paciorek and Schervish (2006) 
for further discussion about the computational demands of such models. 

1.3 Literature review on including covariates in spatial covari- 
ance functions 

Sources of nonstationarity might be related to local influences of some covariate ef- 
fects in the covariance structure of the underlying spatial process. Therefore, it seems 
reasonable to investigate how to include covariate information in the covariance structure 
of spatial processes. 
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Ver Hoef et al. (2006) propose spatial models whose covariance structures incorporate 
flow and stream distance through the use of spatial moving averages. Cooley et al. (2007) 
use covariates (but not geographic coordinates) to model extreme precipitation. 

Calder (2008) makes use of wind measurements in the kernel convolution approach 
of Higdon (1998). However, she only uses the information of wind direction at a single 
location to fix the covariance matrix of the Gaussian kernels of the convolution. 

Schmidt et al. (2011) extend the work of Schmidt and O'Hagan (2003) by allowing a 
latent space, wherein isotropy holds, to be of dimension greater than 2; more specifically, 
geographic coordinates and covariates are used to define the axis of the latent space. 
Schmidt et al. (2011) also propose a simpler version of the higher dimensional latent 
space model, by defining a projection onto the M 2 manifold which makes use of covariate 
information in the covariance structure of the process. Schmidt and Rodriguez (2011) 
use this projection approach to describe the spatial covariance structure when modelling 
multivariate counts of fish species observed across the shores of a lake. 

Reich et al. (2011) make use of the convolution approach proposed by Fuentes (2002) 
to introduce local covariate information in the covariance structure of spatio-temporal 
processes. However, they do not discuss how to include a directional covariate in their 
model. 

The main aim of this paper is to extend the convolution approach revised in Section 
1.2 to allow the kernel function in equation ([!]) to depend on the direction the wind is 



blowing at a particular location. This significantly reduces the number of parameters 
to be estimated, while still allowing for a flexible covariance structure. We propose two 
different ways of capturing the directional behaviour of wind in the covariance structure 
of spatial processes. 

This paper is organized as follows. In Section [2j we propose two different ways of 
accounting for wind direction in the covariance structure of spatial processes induced by 
a convolution approach. Then in Section 2.3, following Paciorek and Schervish (2006) 
we propose an alternative model for S(s) in equation ([3]). The aim is to compare a more 
flexible model with those proposed in Sections 2.1 and 2.2 Inference procedure is based on 
the Bayesian paradigm and this is described in Section [3] Therein, spatial interpolation to 
unmonitored locations of interest, and model comparison are also discussed. In Section |1J 
we analyze the ozone data presented in Section [TTT] Therein we compare the performance 
of the proposed models of Section [2] with standard isotropic and geometric anisotropic 
models. Finally, Section [5] concludes. 



2 Proposed Models 

Assume that observations are a partial realization of a random process {Z(s), s G G}, 
with G G W, where usually p — 1, 2, or 3. More specifically, let 

Z(s)=fx(s) + Y(s) + e(s), (5) 

where /x(s) represents the mean structure of the process and usually is a function of 
location s, e(s) is a white noise process, such that e(s) ~ iV(0,T 2 ), Vs, and is usually 
present to capture measurement error. We assume Y(s) is independent of e(s), Vs, and 
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Y(-) represents a latent spatial process which captures any spatial structure left after 
adjusting for the effect of possible covariates included in //(•). We assume Y(-) is a 
Gaussian process described by 

Y(s) = J k StX (h)W(h)dh, for s, h e G C 1R 2 , (6) 

G 

where k S}X (h) is a spatially varying kernel function that depends not only on the location 
s but also on the observed covariate x(s). 

We now propose two different formulations for k SiX (.). We consider x(s) = (u(s), v (s))', 
where u(s) and v (s) are, respectively, the u and v components of the wind direction at 
location s. We start by presenting an alternative model for S(s) in equation Indeed, 
for our model, we denote this matrix as S(s,x) as it also depends on the wind direc- 
tion. Then we introduce a more flexible kernel function which also accounts for wind 
direction but the integral in ^ does not have an analytical solution and we resort to 
the discretized version of the convolution approach to make inference about the process 



of interest. In order to compare the proposed models of Sections |2.1| and |2.2[ Section 
|2.3| proposes an alternative parameterization to the one used by Paciorek and Schervish 
(2006) when modelling S(s) in equation ([3]). 



2.1 Accounting for wind direction in the nonstationary Matern 
covariance function 

Here we focus on the nonstationary version of the Matern covariance function and 
introduce the covariate information in the kernel matrices, which we denote as 
Let x(s) = (u(s),v(s))' be a vector representing the directional covariate at location 
s E G, with = 1. We propose that the kernel matrices of equation ^ are 

modelled as 

cos co(x(s)) sin lo(x(s)) 
— sin uj(x(s)) cos oj(x(s)) 
(7) 

where u(x(s)) = arctan ^^yj- Similar to Paciorek and Schervish (2006), each location 

s has a Gaussian kernel with mean s, but we propose the covariance kernel S(s,x) to 
vary spatially with the wind direction. The matrix T(s,x) plays the role of a rotation 
matrix, whose angle of rotation is given by the arctangent between the components of 
the wind vector at location s. This makes the Gaussian kernel at location s to coincide 
with the wind direction at s. On the other hand, the diagonal matrix A captures the 
magnitude of the major and minor axis of the ellipses associated to the contours of the 
Gaussian kernel. These are assumed fixed across the region G. Therefore, when inference 
is performed based on n monitoring locations, the estimated values of A 2 and A 2 , can be 
interpreted as an overall mean direction of the axis of the ellipses associated with the 
kernel at each location s. As this process is based on the above matrix kernel we denote 
it as a Locally Geometric Anisotropic (LGA) model. 

Although this kernel varies with location, it does so according to the variation of x(.). 
In contrast to Higdon et al. (1999) and Paciorek and Schervish (2006), who make use of 



s(s, x) = r(s, x) T Ar(s, x) 



cos lo(x(s)) — sin u(x(s)) 
sin uj(x(s)) cos oj(x(s)) 



A? 



\ 2 
A 2 
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independent latent GP, we let the directional covariate guide the matrices E(s, x), reduc- 
ing significantly the number of parameters to be estimated. Clearly, this parameterization 
of S(s,x) involves only three parameters in the specification of K(.). 

Note that both A^ and A| are positive. When performing inference about these quan- 
tities we follow the Bayesian paradigm and assign independent, non-informative, inverse 
gamma prior distributions for these parameters. 

Although this proposal is efficient in reducing the number of parameters to be esti- 
mated and accounting for some wind information, it still fails to compare the directions 
at which the winds are blowing at two different locations, Sj and Sj in G. This undesired 
feature of this kernel is depicted in the panels of Figure |2j Note that in both cases the 
coincident areas of the two ellipses are the same. Therefore, the resultant covariance 
function, conditional on the same values of Ai and A2, will produce the same values. 
However, this does not seem reasonable, as the winds in the right panel are blowing in 
opposite directions. Ideally, we would like to have a correlation function which results in 
a higher correlation for the wind information in the left panel, when compared to that 
of the right panel. 




(a) Wind vectors blowing at the same di- (b) Wind vector blowing at opposite direc- 
rection tions 



Figure 2: Illustration of the contours of the kernel matrix proposed in equation ([7]) based 
on locations with contrasting situations about the wind information (black arrows). 

In the next section we propose an alternative kernel that makes use of a quantity 
which captures how concordant two locations are in terms of the direction at which the 
wind vectors are pointing at the two locations. 

2.2 An alternative kernel based on a projection measurement 

We now propose an alternative way to account for a directional covariate in the kernel 
function k a>x (.) of equation ([6]). We start by proposing a measurement that provides 
the degree of concordance between any two vectors. This measurement is based on a 
projection of the mean wind direction between two locations. That is, we say that two 
vectors, at two distinct locations, tend to be more concordant, the more similar is the 
direction in which the respective winds are blowing. Next we propose a kernel function 
that makes use of the norm of this projection measurement. 

Projection of the average wind direction as a degree of concordance between 
vectors Let x(s) be defined as before, and define r(x(s),x( s*)) = as the 
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mean vector when one considers locations s and s* in G. The projection of the mean 
vector over the set {b x (s — s*) + s* : b G IR} (a straight line that passes through s and 
s*) is 

. (r(x(s),x(s*)),(s-s*)) 
proiJs, s ) = ., — \ / — r-^ 1 x (s - s ), 

where (01,02) represents the inner product between vectors 01 and 02- The smaller 
(greater) the angle between the mean vector and the direction between the two loca- 
tions, the greater (smaller) is \\proj x (s, s*)\\, the norm of proj x (s, s*). In other words, 
this measurement takes into account the concordance of the directional covariate and the 
direction that crosses the locations. Figure [3] shows a diagram that depicts how the pro- 
jection captures the concordance between the vectors x(s) and x(s*). In the left column, 
the measurements at locations s and s* are more or less aligned, pointing nearly to the 
same direction, whereas in the right column, the measurements are nearly perpendicular, 
pointing to different directions. It is clear from the third row of this picture how the 
concordance between the directions is reflected on the norm of the wind mean vector; 
that is, \\proj x (s, s*)\ \ is greater for the vectors on the left panel, than that obtained for 
the vectors on the right panel of the figure. 







Figure 3: Diagram with directional covariates, black arrows (first row), mean direction r 
(second row), and resultant projection, gray arrows (third row). 

Here, the kernel function is modelled as 



k s ,x (h) 



J a SyX (h) dh 



where a s>x (.) is a function that is specified below. Following the definition of the kernel 
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k SjX (.) in equation ^8j), Y(-) is a zero mean Gaussian process with covariance function 
Gov (Y (s) ,Y (s*)) = / ' k S)X {h)k s *. x *{h)dh 



G 



J a SjX (/i) a s ., x * (/t) d/i 

a 2 . G _ . = . (9) 

// (a SiX (h)) 2 dh J (a s *, x * (h)fdh 



G \ G 

Now the reason for the form of the kernel proposed in equation ^ is clear. The 
covariance function between any locations s and s* can be expressed as a product of the 
variance a 2 , by a correlation function. This decomposition of the covariance function 
is a standard approach in the geostatistical literature, see e.g. Banerjee et al. (2004). 
Therefore, a can be viewed as the standard deviation of Y(.), and the function a SjX (.) is 
related to the spatial correlation of the process. 

As previously mentioned, the kernel function is any nonnegative function which usu- 
ally decays monotonically from its mode. The motivation for the way we model a SjX (.) 
comes from the exponential correlation function. We assume 



<X,x(h) = 1 ^ \ ^+<t>*\\proMh)\\) ' lf S ^ h (10) 
' \ 1, if 8 = h. 

In order to guarantee an exponential decay of the kernel we assume 0i > and 02 > 0. 
This kernel decays exponentially with the ratio between the Euclidean distance between 
s and h, and the norm of the projection of the average wind direction. The influence 
of the different components is captured by two parameters, 0i and 02- In particular, 
0i measures the effect of the Euclidean distance, and 02 the effect of the directional 



covariate in the correlation structure of Y{.). From equation (10) we see that isotropy is 
approximately attained when 02 — > 0. We denote a process based on this kernel a s>x (.) 
as a Projection Model. 

In Section |4| when fitting this model to the ozone data, we assign independent, non- 
informative, inverse gamma prior distributions for 0i and 02- 



The kernel in (10) does not provide a closed form of the covariance structure of 
Y(.). For this reason, we resort to the discretized version of this process and assume 
Y( s ) = YmLi k s ,x(hi)W(hi). Therefore, all the integrals in equation fcfy are substituted by 



their discretized versions. Note that similarly to the kernel parameterization proposed in 



Section 2.1, the number of parameters to be estimated associated to Y(.) under equation 



(10), are significantly smaller when compared to the general approaches reviewed in 



Section |1.2| Because of the reduced number of parameters to be estimated, when fitting 
this model in the analysis presented in Section |4j we consider the monitoring locations as 
the grid points to approximate the integrals of equation We discuss the selection of 
the grid points in further detail in Section [5} Section A of the Supplementary Materials 
depicts the behaviour of the proposed correlations functions for different values of the 
parameters. 
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2.3 



An alternative model for E(s) of the nonstationary Matern 
covariance function 



Sections 2A_ and 2.2| discuss different ways of accounting for wind direction in the 
covariance structure of Y(.) resulting in nonstationary covariance functions. It is reason- 
able to compare this approach with those proposed by Higdon et al. (1999) or Paciorek 
and Schervish (2006), which are able to capture more flexible covariance structures. In 
Section [3] we tried fitting both models to the ozone data of Section |1.1[ but did not get 
convergence of the chains. Both models heavily depend on pre-fixed constants; and, from 
our experience, the algorithms seem to be quite sensitive to the choice of these constants. 

Following Paciorek and Schervish (2006), we construct each S(s) in equation (|3| using 
the spectral decomposition, that is, we assume E (s) = T (s) T A (s) T (s). The matrix T(s) 

is a diagonal matrix of eigenvalues, A (s) = A 1 v ' 



and Schervish (2006), we assume T (s) 



. Different from Paciorek 

A 2 (s) 
cos (0(a)) sin(0(s)) 
-sin (0(a)) cos(0(s)) 
The kernel matrices are expected to vary smoothly across the spatial domain. In our 
proposal this is guaranteed through the prior specification of Xj(s), for j = 1, 2, and 9(s). 
We assume logAj(.) follow independent Gaussian processes, with mean /i^, variance 
a\ and a squared exponential covariance function, that is, Cov(log A,-(s), log XJs*)) = 

ff : exp /_ M n. Next , wemodeie(s)as 



7T 



000= n*(7(*)), 



where $(.) denotes the cumulative distribution function of the standard normal distribu- 
tion. We assume j(s) follows a Gaussian process with mean /i 7 and a squared exponential 

covariance function, that is, Cov( , y(s), r y(s*)) = cx 7 exp 



Note that 9(s) varies smoothly across the region induced by the GP for j(s). Under 
the parametrization in equation (11), 9(s) 6 [0,7r/2]. This restriction on the values of 



the rotation angles is imposed to avoid possible unidentifiability problems due to the 
specification of T(s). 

When fitting this model in Section |4| we assume the following prior specification: 
independent zero mean normal prior distributions, with large variance, for and fi\ 2 , 
and a standard normal prior distribution for /x 7 . For the decay parameters <p\ and 7 
we assign an inverse gamma prior distribution with parameters fixed based on the idea 
that the practical range (when the correlation is equal to 0.05) is reached at half of 
the maximum distance between geographical locations and the prior variance is fixed at 
some relatively large value. For the variance parameters, a\ and <r 7 , we assign an inverse 
gamma prior distribution with mean and variance equal to 1. 
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3 Inference procedure 



Assuming the model proposed in equation and letting z = (z(si), ■ ■ ■ , z(s n ))' be 
a partial realization from a Gaussian process with mean vector fi and covariance matrix 
£, the likelihood function is given by 

L(z; 0) = (27T)-/ 2 | £ r 1 / 2 exp j-i(z - /ifE" 1 ^ -/*)}. 

Following Sanso and Guenni (2004), we write the likelihood function in terms of rj = 
r 2 /a 2 , such that £ = r 2 (/ n + r/ _1 r2(<5)), where I n represents a n-dimensional identity 
matrix, 8 = (61,62, ■ ■ ■ ,8k) comprises the parameter vector of the kernel function, and 
Q(8) is the associated resultant covariance matrix. Then the parameter vector to be 
estimated is given by 6 = (11, r 2 , rj, 8). 

Inference is performed under the Bayesian paradigm, and we assume the components 
of the parameter vector 6 to be independent a priori. Generally, one assumes = 
Q(s)'f3, where Q(-) is a p-dimensional vector with covariates that might influence the 
mean of Z(-), and (3 is a p-dimensional parameter vector. It is common practice to assign 
independent zero mean normal prior distributions with some fixed large variance to each 
(3j, j = 1, • • • ,p. For t 2 and 7] we assign independent, non-informative, inverse gamma 
prior distributions with parameters equal to 0.1. The elements of 8 are model dependent. 
The Supplementary Material discusses the prior specification of the parameters in 8 under 
each of the models proposed in the previous section. 

Regardless of the fitted model, the resultant posterior distribution, ir(0 \ z), does 
not have a closed form, and we resort to MCMC algorithms to obtain samples from the 
target distribution. In particular, we make use of the Gibbs sampler with some steps of 
the Metropolis-Hastings algorithm. The full conditional posterior distributions for each 
of the models are shown in Section 2 of the Supplmentary Material. 



3.1 Predictive inference 

We now discuss how to obtain predicted values at unmonitored locations of interest. 
Let z* = (z(s n+ i), z(s n+q )) be a vector representing the value of the process at q un- 
monitored locations of interest. Samples Z(s) are being generated from the multivariate 
normal distribution, N(fj,, £), the posterior predictive distribution, p(z*|z), is given by 

p(z*|z) = [ p(z*\z,0)7i(0\z)dd. (12) 
Je 

From the theory on the multivariate normal distribution, the joint distribution of (Z, Z*), 
conditional on 6, follows a multivariate normal distribution, 




Therefore, based on the properties of the partition of the multivariate normal distribution, 

(Z* \Z, 6)~N (fJL B + Y^BA^aa (z ~ fJL A ) , ^BB - E BA E^ A E AB ) . (13) 
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The integration in (12) does not have an analytical solution, however approximations 



can be easily obtained through Monte Carlo methods (Gamerman and Lopes 2006). For 
each sample /, / = 1, • • • ,L, obtained from the MCMC algorithm, we can obtain an 



approximation for (12), by sampling from the distribution in (13) and computing 



1 L 



(14) 



l=i 



The approximation above is also suitable for model comparison. Usually, one holds a 
set of observations out from the inference procedure, say z*, and uses the predictive 
likelihood to compare all fitted models to check under which model the actual observed 



values are more likely to be generated from. Greater values of p(z*|z) in (14) point to 
the best model. 



Interpolating the wind field Both models proposed in Sections 2A_ and 2J2 require 
the measurements of the components of wind direction, x(.) at the same locations where 
the process Z(.) is observed. The approach proposed in Section 2.1 requires that x(.) 
and Z(.) are observed at the same locations. On the other hand, the parameterization 
of the kernel in equation (10) requires that the components of x(.) are observed at the 
same locations of the grid points, {hi, • • • , h m }. If these observations are not available 
we propose to model x(.) and obtain interpolated values for unmonitored locations of 
interest. 

We suggest that spatial interpolation of x(s) follows Wikle et al. (2001) who, based 
on physical grounds, assume prior independence between the components u(s) and v(s) 
of x(.s). Therefore, we assume that the first coordinate of the vector x(.), follows a 
Gaussian process {U(s) : s G G}, with mean \i u and exponential covariance function, 

Cov(U(s),U(s*)) = u„exp (~ l|s ^f ^ ) s * e ^. And the second coordinate of the 

vector x(s) follows another, conditionally independent, Gaussian process {V(s) : s G G}, 

with mean \x v and exponential covariance function, Cov(V(s), V(s*)) = <j\ exp 



Vs, s* G G. We fit these independent models to each of the components of the wind field, 
and obtain estimated values ofu(-) and v (■) at all unmonitored locations of interest. The 
inference procedure for the parameters in the model for Z(.) follow conditioned on these 
interpolated values. 



3.2 Model Comparison 

In Section [4] we fit different models to the ozone levels described in Section 11.11 We 
briefly review two model comparison criteria: the Deviance Information Criteria (DIC), 
the Posterior Predictive Loss Criterion. See Banerjee et al. (2004) for more discussion 
about these criteria. 



Deviance Information Criterion Spiegelhalter et al. (2002) propose a generalization 
of the AIC based on the posterior distribution of the deviance, D(0) = — 21ogL(z; 0). 



13 



The Deviance Information Criterion (DIC) is defined as 

DIC = D+p D = 2D- D(0), 

where D defines the posterior expectation of the deviance, D = Eg\ z (D), and pr> is the 
effective number of parameters, pn = D — D(0) and here represents the posterior 
mean of the parameters. Smaller values of DIC indicate a better fitting model. Note 
that computation of DIC is easily achieved through MCMC methods. 

Posterior Predictive Loss Criterion An alternative to DIC is the posterior predic- 
tive loss (PPL) introduced by Gelfand and Ghosh (1998). This measurement is based on 
replicates of the observed data, Z^ rep I — 1, • • • , n, and the selected models are those that 
perform well under a so-called loss function. This loss function penalizes actions both 
for departure from the corresponding observed value as well as for departure from what 
we expect the replicate to be (Banerjee et al. 2004). The criterion is computed via 

where G = J2?=i{^i ~ ^ obs ) 2 and P = Y17=i a i- Here \i x = E(Z^ rep | z) and of = 
Var(Z^ rep | z), denote the mean and variance of the predictive distribution of Zi^ rep 
given the observed data z. Gelfand and Ghosh (1998) mention that ordering of models 
is typically insensitive to the choice of k, therefore we fix k = 1. Notice that at each 
iteration of the MCMC we can obtain replicates of the observations given the sampled 
values of the parameters. 

4 Data Analysis 

In this section, we illustrate the proposed models using measurements of ozone ob- 
served at 3pm, on November, 11th, 2008. The monitoring locations and the observed 
values are shown in Figure [TJ We fit 5 different models to this dataset. The aim is to 
compare how the different models behave in terms of model fitting, spatial interpolation 
and, specially, the associated uncertainty of the spatial interpolation. 

Fitted Models The notation of the fitted models is the following: 
Ml Isotropic model with Matern covariance function 
M2 Elliptical anisotropic model with Matern covariance function, that is 

i / \ IM| T A \\u\\ \ / a/IImII 21 A ||?d| \ 
Cov(Y( Si ),Y( Sj )) = a 2 (2^ 1 T(u))- 1 1 k v ' , 

where 

cos 9 sin 9 
— sin 9 cos 9 ' 



A 



cos 9 — sin 9 
sin 9 cos 9 



Ai 
A 2 
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where < 9 < tt/2, because of unidentiflability reasons, denotes the modified 
Bessel function of the third type and order z/, Y is the usual Gamma function, ip > 
is a parameter related to the decay of the correlation as the distance between the 
locations increases and v > determines the smoothness of the process. Ml is a 
particular case of M2 when 9 = and Ai = A2 = 1. 

M3 Nonstationary Matern covariance function with E(s,x) as in equation ([7]) 

M4 Covariance function based on the Projection model as presented in equation <\9 



M5 Nonstationary Matern covariance function with S(s) as proposed in Section 2.3 



For all models we fixed the smoothness parameter of the Matern covariance function (u) 
at 1. For models Ml to M4 we let the MCMC run for 50,000 iterations, considered 10,000 
as burn in, and kept every 10-th iteration, to reduce autocorrelation within the chains. 
For model M5 we let the chain run for 700,000 iterations, considered 100,000 as burn in, 
and kept every 100-th iteration. Table [T] shows the computational time needed to run 
each of the models in an Intel(R) Core(TM)2 Quad CPU Q9550 2.83GHz computer with 
4 GB of RAM. Although M5 is a more flexible model, as it is able to capture any source 
of nonstationarity in the process, it requires a much longer chain to attain convergence. 
Moreover, when implementing the MCMC for model M5, it is really challenging to tune 
the variance of the proposal distributions of the parameters involved in the modelling of 



as proposed in Section 2.3 





Computational 


No. Iterations 


Model 


time 


minute 


Ml 


7 min 


7142.85 


M2 


1 h 26 min 


581.39 


M3 


1 h 55 min 


434.78 


M4 


21 min 


2380.95 


M5 


144 h 30 min 


80.73 



Table 1: Computational time, and number of iterations per minute, to run the MCMC 
algorithm for 50,000 iterations for models Ml, M2, M3, M4, and 700,000 iterations for 
model M5 in an Intel(R) Core(TM)2 Quad CPU Q9550 2.83GHz computer with 4 GB 
of RAM. 

Table [2] shows the values of four different model comparison criteria, DIC, PPL, the 
predictive likelihood based on the circled locations shown in Figure [TJ and the mean 
squared error (MSE). The MSE was computed using the mean of the posterior predictive 
distribution as the fitted value. Clearly, models Ml and M2 result in the worst perfor- 
mance in terms of PPL and DIC. This is an indication that the assumption of isotropy or 
elliptical anisotropy is not reasonable for this dataset. On the other hand, M4 results in 
the smallest values of PPL and DIC, followed by model M5, and M3. When considering 
the predictive likelihood based on the locations left out from the inference procedure, 
model M5 does not perform as well as models M3 and M4. But this result might be 
sensitive to the set of locations which were left out from the inference procedure. 
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iviociei 


PPL 


DIC 


Predictive 


TV/rQTT 1 
IVlori/ 


G 


P 


D l 


D 


Pd 


DIC 


likelihood 


Ml 


458.77 


1395.79 


1625.18 


326.06 


3.04 


329.10 


2.33 x 10~ 08 


131.10 


M2 


251.37 


1050.26 


1175.95 


318.70 


-1.80 


316.37 


6.18 x 10~ 08 


100.53 


M3 


87.80 


639.06 


682.96 


309.46 


3.88 


313.34 


6.49 x 10~ 07 


45.33 


M4 


90.38 


464.93 


510.12 


289.79 


4.03 


293.82 


4.93 x lO" 06 


25.80 


M5 


59.97 


539.60 


569.59 


302.29 


3.03 


305.32 


9.98 x 10~ 08 


70.46 



Table 2: Model comparison criteria: PPL, DIC, the predictive likelihood based on the 
circled locations in Figure [TJ and the mean squared error, under each fitted model. 



Figure [4] presents the posterior summary of the variance components under each 
fitted model. The aim is to compare the estimated values of the variance of the spatial 
component a 2 and the nugget effect, r 2 . Clearly, the models with more flexible covariance 
functions, M3, M4 and M5, result in the smallest values of the nugget effect. And in 
particular, model M4 results in the smallest value of r 2 . This suggests that there is less 
structure left in the measurement error under model M4 when compared to the other 
models. On the other hand, models Ml and M2, the simplest ones, result in the largest 
values of r 2 , suggesting that there is some structure left in the data that the spatial 
component of these respective models is not able to capture. 
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Figure 4: Posterior summary of the variance of the spatial process (a 2 ) and the nugget 
effect (r 2 ) under each of the fitted models M1-M5. 

Figure [5] shows the posterior mean of the resultant ellipses obtained under models 
M3 and M5. This helps to understand how the spatial correlations are changing across 
the region under models M3 and M5. Clearly, the ellipses under model M3 follow the 
direction of the wind field. Although model M5 does not use any information about the 
wind field, at some parts of the region, the direction of the ellipses resemble the observed 
wind field (e.g. see the west and the middle portions of the region). 



The behavior of the proposed correlation functions in Sections 2A_ an d 2J2 can be 
visualized through the panels of Figure [6j The first row presents the estimated correlation 
for three different points, with all the others in the grid under model M3. And the panels 
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Figure 5: Left panel: wind vectors and respective estimated ellipses under model M3. 
Right panel: Estimated ellipses under model M5. In both panels the gray scale is asso- 
ciated with the observed values of ozone. The ellipses were scaled to ease visualization. 



of the second row show the estimated correlation for the same points under model M4. 
From these panels it is clear that the proposed models are able to capture the influence 
of the wind field on the estimated correlation function. 

Figure [7] presents the mean and standard deviation surfaces of the posterior predictive 
distribution of ozone for unmonitored locations formed by a regular grid superimposed 
over the region. For models M3 and M4, we interpolated the wind components, U(.) and 
V(.), based on the fitting of independent Gaussian processes as discussed in Section 3.1 



The plot of the interpolated wind field can be seen in Figure 3 of the Supplementary 
Material. Now, the effect of the estimates of a 2 , r 2 , and the correlation structure of the 
spatial process Y(.) under each fitted model is clearer. The spatial distribution of the 
standard deviations of the predictive distribution differ a lot across the models (second 
row of Figure [7]). Models Ml and M2 tend to result in the biggest values of the standard 
deviations, while model M4 results in the smallest values, followed closely by model M3. 
The mean of the predictive distribution across the models are quite similar, with some 
differences in the southeast portion of the region. 

Finally, Figure [8] shows the posterior summary of the fitted values versus the observed 
ones. Clearly, despite of model M4 being much simpler than model M5, both tend to 
present very similar results in terms of model fitting, followed closely by model M3. From 
these panels the difference between the proposed models, M3, M4 and M5, and models 
Ml and M2 becomes clearer. Note that the ranges of the 95% credible intervals under 
models Ml and M2 are much wider than those obtained under models M4, M5, and M3. 
This is another indication of the gain in considering more flexible covariance functions 
when modelling this spatial process. 
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Figure 6: Posterior mean of the estimated correlation between the location marked by a 
circle and all the others in the grid, under models M3 (first row) and M4 (second row). 
Arrows indicate the direction at which the wind is blowing at the monitoring locations. 
The wind information at unmonitored locations was obtained through the spatial inter- 
polation described in Section 3.1 and panel shown in Section C of the Supplementary 
Material. 
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Figure 7: Surfaces of the posterior mean (first row) and standard deviations (second row) 
of the predictive distribution for unmonitored locations under each of the fitted models, 
M1-M5. 
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Figure 8: Posterior summary, mean (solid circle) and limits of the 95% credible intervals 
(vertical line), of the observed values versus the fitted ones, under each fitted models, 
M1-M5. 
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5 Discussion 



Wind direction influences many environmental processes of interest. We suggested 
two different approaches to consider wind information in the covariance structure of 
environmental processes observed at fixed locations. The challenge is how to account 
for the influence of the wind direction while still guaranteeing that the resultant spatial 
covariance structure is positive definite. Borrowing ideas from the convolution approach 
of Higdon et al. (1999), and Paciorek and Schervish (2006), we proposed to include the 
directional covariate in the kernel function of their convolution approach as discussed in 
section [2j 

In particular, in Section 2A_ we described how to introduce information about wind 
in the nonstationary Matern covariance function proposed by Paciorek and Schervish 
(2006). Although the resultant covariance function between two locations accounts for 
wind information at these locations, it fails to account for the direction at which the 
winds are blowing at each point. To overcome this possible drawback, we proposed an 
alternative approach in Section < 2J2_. This proposal is based on an exponential kernel which 
depends on the Euclidean distance between locations and on the norm of a projection 
quantity which indicates how concordant two locations are in terms of the mean direction 
at which the wind is blowing at these locations. As the proposed kernel does not result 
in a closed form of the covariance function, we propose to use the discretized version of 
the convolution approach, as described in equation (pi). 

The use of the directional covariate significantly reduces the number of parameters to 
be estimated, while still allowing for flexibile covariance structures. In order to compare 
the performance of the proposed models in Sections |2.1 and 2.2[ Section 2.3 proposed an 
alternative parameterization of the spectral decomposition of S(s) in equation ([3]). Under 
this approach, different from Paciorek and Schervish (2006), we avoid the use of arbitrary 
constants, which are problem specific and difficult to fix when modelling S(s). When 
compared to the models of Sections 2.1 and 2.2, this is a more flexible model, because 
it is able to capture different sources of nonstationarity. However, the great number of 
parameters to be estimated lead to some difficulties when obtaining samples from the 
resultant posterior distribution. More specifically, the chains of the parameters in S(s) 
take very long to converge, requiring careful tunning of the variance of the proposal 
distributions in the Metropolis-Hastings step. 

In Section [4] we fit five different models to ozone data observed at a particular time of 
a day in the East region of the USA. As expected, the MCMC algorithm to fit the model 
in Section 2J3 requires significantly longer chains to provide evidence of convergence. For 
this particular dataset, the proposed models of Sections |2. 1| and [2~2] perform better than 
standard models in the geostatistics literature. Moreover, although they have very few 
parameters to be estimated, they result in quite comparable fits to those obtained under 
the more flexible model of Section 2.3, as shown in Tableland Figure [8] 

The proposed model of Section 2A_ requires the observation of wind at the same 
locations where the process of interest Z(.) is observed. On the other hand, the model 
in Section 2J2 requires the wind information at the points of the regular grid used to 
approximate the continuous process as proposed in equation Q. If the wind observations 
are not available at the locations of interest, one can perform spatial interpolation of the 
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wind field as suggested in Section 3.1| It is common that wind information is available 
from satellite measurements which usually are available in the form of a grid of locations. 
When this is the case, we suggest to use this grid as the approximating grid of the model 
in Section I2T21 



The kernel functions proposed in Sections |2.1| and [2.2| can be used in any other context 
for which a directional covariate might influence a spatial process of interest. For example, 
when modelling a process observed at locations over an ocean, one can fit models with the 
kernel functions proposed here to investigate the influence of sea current on the spatial 
covariance function of such processes. 

Introducing covariates in the covariance structure of spatial processes seem to pro- 
vide reasonably flexible models, while significantly reducing the number of parameters 
to be estimated. From our experience, it is important to understand well the pro- 
cess of interest in order to propose which covariate, if any, might be used to obtain 
a flexible spatial covariance structure. In particular, when introducing a directional co- 
variate, such as the wind field, some care must be taken. In order to visualize better 
what kind of fields, and correlations the covariance functions we propose here can pro- 
vide, we made available, in the web link http : //www .uf jf .br/joaquim_neto/ensino/| 
materiais/convolucoes-convolutions/, R-TclTk softwares that produce graphical 
correlations, covariances and simulations from the proposed models given values of the 
parameters specified by the user. We believe this tool might help the practitioner to 
check if the proposed models are able to reproduce what is expected to be observed in 
practice. 



Supplementary Materials 

The reader is referred to the online Supplementary Materials for technical appendices. 
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